arXiv:1507.05876v4 [math-ph] 13 Jan 2017 


Discrete Analysis, 2016:9, 15 pp.

www.discreteanalysisjournal.com


Self-Similarity in the Circular Unitary Ensemble

Elizabeth S. Meckes Mark W. Meckes 

Received 20 October 2015; Revised 10 June 2016; Published 15 June 2016 


Abstract: This paper gives a rigorous proof of a conjectured statistical self-similarity
property of the eigenvalues of random matrices from the Circular Unitary Ensemble. We
consider on the one hand the eigenvalues of an $n \times n$ CUE matrix, and on the other hand
those eigenvalues $e^{i\varphi}$ of an $mn \times mn$ CUE matrix with $|\varphi| \le \pi/m$, rescaled to fill the unit
circle. We show that for a large range of mesoscopic scales, these collections of points are
statistically indistinguishable for large $n$. The proof is based on a comparison theorem for
determinantal point processes which may be of independent interest.

Key words and phrases: random unitary matrices, Haar measure, eigenvalues, self-similarity,
determinantal point processes

1 Introduction 

The set of $N \times N$ unitary matrices is a compact Lie group, and as such, possesses a unique probability
measure which is invariant under left- and right-translation (called Haar measure). In random matrix 
theory, the unitary group together with Haar probability measure is called the circular unitary ensemble 
(CUE). The word circular refers to the fact that all of the eigenvalues of a CUE matrix lie on the unit 
circle in the complex plane. 

There has long been a folklore conjecture that the distribution of the eigenvalues of a CUE random 
matrix has a self-similar structure. For example, in their statistical analysis [3] of CUE eigenvalues and
zeroes of the Riemann zeta function, Coram and Diaconis hypothesized that the following may hold:

Conjecture 1. Let $U$ be an $N \times N$ random matrix from the CUE with eigenvalues $\{e^{i\theta_j}\}_{1 \le j \le N}$, where
$0 \le \theta_1 \le \cdots \le \theta_N < 2\pi$. Choose an eigenvalue $e^{i\theta_K}$ uniformly, and let $T$ be the length of the
counterclockwise circular arc from $\theta_K$ to $\theta_{K+k}$, where the indices are interpreted modulo $N$. Let $\phi \in [0, 2\pi)$ be
a uniformly chosen random angle, independent of $U$. If $k$ and $N$ are both large, then the random set of
points

\[ \left\{ e^{i\left(\phi + 2\pi(\theta_j - \theta_K)/T\right)} \right\}_{K \le j < K+k} \]

is statistically indistinguishable from the eigenvalues of a $k \times k$ random matrix from the CUE.

That is, a random choice of $k$ sequential eigenvalues of an $N \times N$ CUE matrix $U$, rescaled and
randomly rotated, is indistinguishable from the full set of eigenvalues of a $k \times k$ random matrix.

Aside from statistical evidence for the conjecture, there is a result of E. Rains [15] which is suggestive
of this kind of self-similarity. Suppose that $U$ is an $N \times N$ random CUE matrix, with $N = nk$; Rains
proved that the distribution of the eigenvalues of $U^n$ is exactly that of the collection of eigenvalues of
$n$ independent $k \times k$ random CUE matrices. That is, wrapping the eigenvalues of $U$ around the circle $n$
times produces $n$ independent copies of the $k$ eigenvalues of a $k \times k$ random matrix. It is tempting to view
each of those collections of $k$ eigenvalues as coming from one of the $n$ arcs of the circle that gets stretched
to cover the circle once (this is not at all the way Rains' theorem is actually proved). If this intuition were
correct, it would illustrate exactly the kind of self-similarity conjectured by Coram and Diaconis.


In this paper, we give a rigorous proof of a version of the self-similarity conjecture. The following
notation is used throughout. Let $U$ be an $n \times n$ random CUE matrix with eigenvalues $\{e^{i\theta_j}\}_{1 \le j \le n}$, with
$\theta_j \in [-\pi, \pi)$ for each $j$. (It is a matter of technical convenience to take the arguments of the eigenvalues
to be in $[-\pi, \pi)$ here instead of in $[0, 2\pi)$ as in Conjecture 1.) For $A \subseteq [-\pi, \pi)$, $\mathcal{N}_{n,A}$ denotes the number
of eigenangles $\theta_j$ which lie in $A$; we generally omit the $n$ and write $\mathcal{N}_A$. For $\theta \in [0, \pi)$, $\mathcal{N}_{[-\theta,\theta]}$ is denoted
by $\mathcal{N}_\theta$. For $m \ge 1$, let $U^{(m)}$ be an $nm \times nm$ random CUE matrix with eigenvalues $\{e^{i\varphi_j}\}_{1 \le j \le nm}$, with
$\varphi_j \in [-\pi, \pi)$ for each $j$, and let

\[ \mathcal{N}^{(m)}_{n,A} := \#\left\{ j : \varphi_j \in \left[-\tfrac{\pi}{m}, \tfrac{\pi}{m}\right),\ m\varphi_j \in A \right\}; \]

that is, $\mathcal{N}^{(m)}_A$ counts the random points in $A$ of the point process obtained by taking the eigenvalues of $U^{(m)}$ in the arc of
length $\frac{2\pi}{m}$ about 1, and rescaling to fill out the whole circle. While the total number of eigenvalues in this
arc is random, it concentrates strongly at its expected value of $n$. In the context of the Diaconis-Coram
conjecture, our $nm$ plays the role of $N$ and $n$ plays the role of $k$.


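Theorem 2. Suppose that $m, n \ge 1$, and that $A \subseteq [-\pi, \pi)$ has diameter $\operatorname{diam} A \le \pi$. Then

\[ d_{TV}\left(\mathcal{N}_A, \mathcal{N}_A^{(m)}\right) \le W_1\left(\mathcal{N}_A, \mathcal{N}_A^{(m)}\right) \le \frac{\sqrt{mn}\,|A| \operatorname{diam} A}{6\pi}, \]

where $d_{TV}(\cdot,\cdot)$ denotes total variation distance between random variables, $W_1(\cdot,\cdot)$ denotes $L^1$-Wasserstein
distance, and $|A|$ denotes the Lebesgue measure of $A$.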

For context, recall that $\mathbb{E}\mathcal{N}_A = \frac{n|A|}{2\pi}$; the same is true for $\mathcal{N}_A^{(m)}$.

In the statement of Theorem 2, and all of the following results, precise constants are included for
concreteness, with no claims as to their sharpness. The definitions of $d_{TV}(\cdot,\cdot)$ and $W_1(\cdot,\cdot)$ are recalled at
the end of this section.

Theorem 2 was stated with the implicit assumption that $m$ is an integer, since it is in that case that it
relates directly to Conjecture 1. However, it is only strictly necessary that $mn$ is an integer, and a slight
refinement of the proof shows that for any $m \ge 1$, if $mn \in \mathbb{N}$, then

\[ d_{TV}\left(\mathcal{N}_A, \mathcal{N}_A^{(m)}\right) \le C\left(1 - \frac{1}{m^2}\right)\sqrt{mn}\,|A| \operatorname{diam} A. \tag{1} \]

In particular, taking $m = \frac{n+1}{n}$, for which $\left(1 - \frac{1}{m^2}\right)\sqrt{mn} \le \frac{2}{\sqrt{n}}$, this yields the comparison

\[ d_{TV}\left(\mathcal{N}_{n,A}, \mathcal{N}_{n,A}^{((n+1)/n)}\right) \le \frac{2C\,|A| \operatorname{diam} A}{\sqrt{n}} \]

between $n \times n$ CUE eigenvalues and $(n+1) \times (n+1)$ CUE eigenvalues.

As a consequence of Theorem 2, if $\{A_n\}$ is a sequence of sets such that either $\operatorname{diam} A_n = o(n^{-1/4})$ or
$|A_n| = o(n^{-1/2})$ as $n \to \infty$, then

\[ d_{TV}\left(\mathcal{N}_{A_n}, \mathcal{N}_{A_n}^{(m)}\right) \to 0. \]

Thus indeed, a sequential arc of about $n$ of the $nm$ eigenvalues of an $nm \times nm$ random matrix is statistically
indistinguishable, on the scale of $o(n^{-1/4})$ for diameter or $o(n^{-1/2})$ for Lebesgue measure, from the $n$
eigenvalues of an $n \times n$ random matrix.

A remarkable feature of Theorem 2 is that it yields microscopic information even at a mesoscopic
scale: if $\frac{1}{n} \ll \operatorname{diam} A_n \ll n^{-1/4}$, then $\mathcal{N}_{A_n}$ and $\mathcal{N}_{A_n}^{(m)}$ both have expectations and variances tending to infinity
(as follows from Lemma 7 below). One would thus typically try to understand the point processes at
these scales by studying statistical properties of the recentered and rescaled counts, rather than try to 
observe individual points. Here, we are able to make direct point-by-point comparisons of the two point 
processes treated as discrete objects, with no rescaling or continuous approximations. 

The fact that we are able to compare the two point processes with no rescaling certainly suggests 
that we are witnessing a true self-similarity phenomenon which is a special feature of the structure of 
the eigenvalues of CUE random matrices. However, one should be careful to check that the two point 
processes are not similar simply because they have the same limit. Indeed, Wieand [19] and Soshnikov
[17] showed that

\[ \frac{\mathcal{N}_\theta - \mathbb{E}\mathcal{N}_\theta}{\sqrt{\operatorname{Var}\mathcal{N}_\theta}} \Rightarrow N(0,1) \]

as $n \to \infty$ for fixed $\theta$; the same then follows for $\mathcal{N}_\theta^{(m)}$. Figure 1 gives a convincing visual illustration that
$\mathcal{N}_A$ and $\mathcal{N}_A^{(m)}$ resemble each other more closely than either resembles a Gaussian distribution; a rigorous
proof of this fact is given in Proposition 8 below.

We conjecture that a comparable result to Theorem 2 holds without the restriction on $\operatorname{diam} A$, and
that the factor of $\sqrt{n}$ in the right hand side is an artifact of our proof; this would imply that $\mathcal{N}_{A_n}$ and $\mathcal{N}_{A_n}^{(m)}$
become indistinguishable as long as $|A_n| \to 0$. For more details, see the remark at the end of Section
2. On the other hand, we do not expect such a result to hold for sets of constant size; i.e., independent
of $n$. For example, Rains [14] gives precise asymptotics for $\operatorname{Var}\mathcal{N}_\theta$ for $n \to \infty$ and $\theta$ fixed, which show
that $\operatorname{Var}\mathcal{N}_\theta$ and $\operatorname{Var}\mathcal{N}_\theta^{(m)}$ are not asymptotically equal. This suggests (but does not formally imply) that
Theorem 2 does not hold in this setting. Rains' estimate does show that Proposition 9 below on the
asymptotic equality of variances does not extend to that regime.

Figure 1: Left: Simulated cumulative distribution functions (500 trials) for $\mathcal{N}_{0.2}$ (red) and $\mathcal{N}_{0.2}^{(2)}$ (green),
with $n = 100$.
Right: Simulated cumulative distribution functions (200 trials) for $\mathcal{N}_{0.25}$ (red) and $\mathcal{N}_{0.25}^{(2)}$ (green), with
$n = 500$.
The dotted lines show Gaussian cumulative distribution functions with mean equal to the theoretical mean
of both $\mathcal{N}_\theta$ and $\mathcal{N}_\theta^{(2)}$ (i.e., $\frac{20}{\pi} \approx 6.4$ and $\frac{125}{\pi} \approx 39.8$, respectively) and variance equal to the average of the
two corresponding sample variances.

Kolmogorov-Smirnov statistics, left panel:
  $\mathcal{N}_{0.2}$ to Gaussian: 0.329
  $\mathcal{N}_{0.2}^{(2)}$ to Gaussian: 0.319
  $\mathcal{N}_{0.2}$ to $\mathcal{N}_{0.2}^{(2)}$: 0.016

Kolmogorov-Smirnov statistics, right panel:
  $\mathcal{N}_{0.25}$ to Gaussian: 0.262
  $\mathcal{N}_{0.25}^{(2)}$ to Gaussian: 0.262
  $\mathcal{N}_{0.25}$ to $\mathcal{N}_{0.25}^{(2)}$: 0.04

We expect that a version of Theorem 2 holds for the other circular ensembles of random matrix 
theory; however, our approach is via the determinantal structure of the eigenvalue process for the CUE, 
which is not present outside the unitary case. 

Finally, some comments on the relationship between Conjecture 1 and Theorem 2 are in order. The 
models of self-similarity being used are not identical; in Conjecture 1, exactly $k + 1$ sequential eigenvalues
are selected and stretched as needed to make the first and last meet, resulting in exactly k random points. 
In Theorem 2, the eigenvalues from an arc making up a fixed fraction of the circle are chosen and 
that arc is stretched (deterministically) to cover the whole circle; the resulting total number of points 
is random. However, in the mesoscopic regime the two models are essentially the same. The idea is 
the following: eigenvalue rigidity (see Lemma 10 of [12]) implies that the difference between the 7 * 
and the (7 + «)'*’ eigenangles of an nm x nm CUE matrix is about ^ q. ( 7 ( v2^) probability. 

So whereas Theorem 2 considers the eigenangles of an nm x nm matrix in an interval of length 6 /m, 
Conjecture 1 suggests considering the eigenangles in an interval whose length is random but typically 
about ^ + But if e ^ :yj= ^ then with extremely high probability an interval of length 

6 contains no eigenangles, and so the corresponding counts are the same. 

The rest of this paper is organized as follows. In Section 2 we give the background and general results
on determinantal point processes needed to prove Theorem 2, followed by the proof of the theorem and a
corollary giving a rate for the classical convergence of the eigenvalue process to the sine kernel process on
a microscopic scale. In Section 3 we give precise asymptotics for the variances of the counting functions.
As a consequence, we are able to identify a sharp rate of convergence in the central limit theorem
mentioned above, which is in particular much slower than the merging of distributions in Theorem
2. We also show that the variances of the counting functions of the two processes are asymptotically
equal throughout the entire mesoscopic regime, giving a rigorous proof of another manifestation of the
self-similarity phenomenon. Finally, Section 4 gives a surprising comparison between the joint intensities
of the eigenvalue processes for $U$ and $U^{(m)}$.

We conclude this section with a brief review of the notions of distance used here. The following
distances can be defined much more generally, but for our purposes, it suffices to define them for
integer-valued random variables $X$ and $Y$.

1. The total variation distance between $X$ and $Y$ is defined by

\[ d_{TV}(X, Y) := \sup_{A \subseteq \mathbb{Z}} \left| \mathbb{P}[X \in A] - \mathbb{P}[Y \in A] \right|. \]

2. The $L^1$-Wasserstein distance is defined by

\[ W_1(X, Y) := \inf_{(Z_1, Z_2)} \mathbb{E}|Z_1 - Z_2|, \]

where the infimum is over random vectors $(Z_1, Z_2)$ such that $Z_1$ has the same distribution as $X$ and
$Z_2$ has the same distribution as $Y$ (such a random vector is called a coupling of $X$ and $Y$).

The Kantorovich-Rubinstein Theorem states that $W_1$ can equivalently be defined as

\[ W_1(X, Y) := \sup_f \left| \mathbb{E}f(X) - \mathbb{E}f(Y) \right|, \]

where the supremum is over 1-Lipschitz functions $f : \mathbb{Z} \to \mathbb{R}$. The distance $W_1$ is a metric for the
topology of weak convergence plus convergence of absolute first moments. (See [18, Section 6] for
a thorough discussion and proofs.)

Note that an indicator function of a set $A$ of integers is 1-Lipschitz on $\mathbb{Z}$, and so for $X$ and $Y$
integer-valued,

\[ d_{TV}(X, Y) \le W_1(X, Y). \tag{2} \]
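As an illustration of inequality (2), the following minimal sketch computes both distances for two small integer-valued distributions. It relies on two standard facts that are not stated in the text: total variation distance equals half the $\ell^1$ distance between the probability mass functions, and on the integers $W_1$ equals the sum of the absolute differences of the cumulative distribution functions. The function names and the example distributions are hypothetical.

```python
def total_variation(p, q):
    """d_TV for integer-valued laws given as {value: probability} dicts.

    Uses the standard identity d_TV = (1/2) * sum_k |p(k) - q(k)|.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def wasserstein1(p, q):
    """W_1 on the integers, computed as sum_k |F_X(k) - F_Y(k)|."""
    support = set(p) | set(q)
    cdf_gap = 0.0   # running value of F_X(k) - F_Y(k)
    distance = 0.0
    # At the largest support point both CDFs equal 1, so it is skipped.
    for k in range(min(support), max(support)):
        cdf_gap += p.get(k, 0.0) - q.get(k, 0.0)
        distance += abs(cdf_gap)
    return distance


# Illustrative (made-up) distributions on {0, 1, 2}.
p = {0: 0.5, 1: 0.5}
q = {0: 0.25, 1: 0.5, 2: 0.25}
tv, w1 = total_variation(p, q), wasserstein1(p, q)
print(tv, w1, tv <= w1)
```

The comparison `tv <= w1` is the content of (2) for this pair of distributions.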


2 Determinantal point processes and the proof of Theorem 2 

Let $\Lambda$ be a locally compact Polish space. A simple point process on $\Lambda$ is a random integer-valued
(positive) Radon measure $\chi$ on $\Lambda$, such that the measure of any singleton is at most 1. Alternatively, it
may be viewed as a locally finite random set of points in $\Lambda$; if $A \subseteq \Lambda$ then we write $\mathcal{N}_A = \chi(A)$ for the
(random) number of points lying in $A$. If $\Lambda$ is equipped with a reference Borel measure $\mu$, then the $k$-th
joint intensity or correlation function $\rho_k : \Lambda^k \to [0, \infty)$ of $\chi$ is defined by the equation

\[ \mathbb{E}\left[ \prod_{j=1}^k \mathcal{N}_{A_j} \right] = \int_{A_1} \cdots \int_{A_k} \rho_k(x_1, \dots, x_k) \, d\mu(x_1) \cdots d\mu(x_k), \]

whenever $A_1, \dots, A_k \subseteq \Lambda$ are measurable and pairwise disjoint, assuming that such functions exist. A
simple point process is called a determinantal point process with kernel $K : \Lambda^2 \to \mathbb{C}$ if its joint intensities
exist and

\[ \rho_k(x_1, \dots, x_k) = \det\left[ K(x_i, x_j) \right]_{i,j=1}^k. \]

Note that it is immediate from the definition that the restriction of a determinantal point process on $\Lambda$ to a
measurable subset $D \subseteq \Lambda$ is again a determinantal point process.

A kernel $K : \Lambda^2 \to \mathbb{C}$ defines an integral operator $\mathcal{K}$ on $L^2(\mu)$ by

\[ \mathcal{K}(f)(x) := \int_\Lambda K(x, y) f(y) \, d\mu(y); \tag{3} \]

if $K(x, y) = \overline{K(y, x)}$, then the operator $\mathcal{K}$ is self-adjoint. It was proved by Macchi [11] and Soshnikov
[16] that a kernel $K$ which defines a self-adjoint, trace class operator $\mathcal{K}$ as above is the kernel of a
determinantal point process if and only if all of the eigenvalues of $\mathcal{K}$ lie in $[0, 1]$.

For the remainder of this paper, $\chi_n$ will denote the point process of eigenvalue angles in $[-\pi, \pi)$ of an
$n \times n$ CUE random matrix. For fixed $m \ge 1$, let $\chi_n^{(m)}$ denote the point process obtained by multiplying by
$m$ those eigenvalue angles of an $nm \times nm$ CUE random matrix which lie in $\left[-\frac{\pi}{m}, \frac{\pi}{m}\right)$.

It is a fact originally due to Dyson that $\chi_n$ is a determinantal point process on $[-\pi, \pi)$; it follows easily
that $\chi_n^{(m)}$ is as well. The following Proposition gives explicit formulae for the corresponding kernels.

Proposition 3. The point process $\chi_n^{(m)}$ on $[0, 2\pi)$ is determinantal with kernel

\[ K_n^{(m)}(x, y) = \frac{1}{2\pi} \frac{\sin\left(\frac{n(x-y)}{2}\right)}{m \sin\left(\frac{x-y}{2m}\right)} \]

with respect to Lebesgue measure.

Proof. The case $m = 1$ was proved by Dyson in [5] (although that work predates the language of
determinantal point processes); see also [13, Section 11.1] or [10, Section 5.4]. The general case follows
from a change of variables which shows that $K_n^{(m)}(x, y) = \frac{1}{m} K_{mn}^{(1)}\left(\frac{x}{m}, \frac{y}{m}\right)$. □

Note in particular that the corresponding operators $\mathcal{K}_n^{(m)}$ as defined in (3) are self-adjoint and trace
class.
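The change of variables in the proof of Proposition 3 can be checked numerically. The sketch below evaluates the kernel $\frac{1}{2\pi} \sin(n(x-y)/2) / (m \sin((x-y)/(2m)))$ and compares it with $\frac{1}{m}$ times the $m = 1$ kernel of size $mn$ at the rescaled points. The function name and the sample values are hypothetical, and the diagonal is handled by the limiting value $n/(2\pi)$.

```python
import math


def cue_kernel(x, y, n, m=1):
    """Kernel of Proposition 3: sin(n(x-y)/2) / (2*pi*m*sin((x-y)/(2m)))."""
    d = x - y
    denom = m * math.sin(d / (2 * m))
    if abs(denom) < 1e-12:
        # Limit as y -> x: sin(n d / 2) / (m sin(d / (2m))) -> n.
        return n / (2 * math.pi)
    return math.sin(n * d / 2) / (2 * math.pi * denom)


# Illustrative check of K_n^(m)(x, y) = (1/m) K_{mn}^(1)(x/m, y/m).
n, m, x, y = 4, 3, 0.7, -0.4
lhs = cue_kernel(x, y, n, m)
rhs = cue_kernel(x / m, y / m, m * n, 1) / m
print(lhs, rhs)
```

On the diagonal the kernel equals $n/(2\pi)$ for every $m$, which matches the fact that both processes have expected count $n|A|/(2\pi)$ in a set $A$.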

The following general result on determinantal point processes is the main technical ingredient behind 
Theorem 2. 

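Proposition 4. Let $\mathcal{N}$ and $\widetilde{\mathcal{N}}$ be the total numbers of points in two determinantal point processes on
$(\Lambda, \mu)$ with conjugate-symmetric kernels $K, \widetilde{K} \in L^2(\mu \otimes \mu)$, respectively. Suppose that $\mathcal{N}, \widetilde{\mathcal{N}} \le N$ almost
surely. Then

\[ d_{TV}(\mathcal{N}, \widetilde{\mathcal{N}}) \le W_1(\mathcal{N}, \widetilde{\mathcal{N}}) \le \sqrt{ N \int\!\!\int \left| K(x, y) - \widetilde{K}(x, y) \right|^2 d\mu(x) \, d\mu(y) }. \]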


Proposition 4 depends on the following remarkable property of determinantal point processes. 




Lemma 5 ([8, Theorem 7]). Consider a determinantal point process with kernel $K$, whose corresponding
integral operator $\mathcal{K}$ is self-adjoint and trace class, with eigenvalues $\{\lambda_j\}$. Let $\mathcal{N}$ be the total number of
points in the process. Then

\[ \mathcal{N} \overset{d}{=} \sum_j \xi_j, \]

where $\{\xi_j\}$ are independent Bernoulli random variables with $\mathbb{P}[\xi_j = 1] = \lambda_j$ and $\mathbb{P}[\xi_j = 0] = 1 - \lambda_j$.

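Proof of Proposition 4. By (2), it suffices to prove the second inequality.

Let $\{\lambda_j\}$ and $\{\tilde\lambda_j\}$ be the eigenvalues, listed in nonincreasing order, of the integral operators $\mathcal{K}$ and
$\widetilde{\mathcal{K}}$ with kernels $K$ and $\widetilde{K}$ respectively. Since $\mathcal{N}, \widetilde{\mathcal{N}} \le N$, by Lemma 5, $\lambda_j = \tilde\lambda_j = 0$ for $j > N$. Let $\{Y_j\}_{j=1}^N$
be independent random variables uniformly distributed in $[0, 1]$. For each $j$, define

\[ \xi_j = \mathbb{1}_{Y_j \le \lambda_j} \quad \text{and} \quad \tilde\xi_j = \mathbb{1}_{Y_j \le \tilde\lambda_j}. \]

Through Lemma 5, this gives a coupling of $\mathcal{N}$ and $\widetilde{\mathcal{N}}$, and so

\[ W_1(\mathcal{N}, \widetilde{\mathcal{N}}) \le \mathbb{E}\left| \sum_{j=1}^N \xi_j - \sum_{j=1}^N \tilde\xi_j \right| \le \sum_{j=1}^N \mathbb{E}\left| \xi_j - \tilde\xi_j \right| = \sum_{j=1}^N \left| \lambda_j - \tilde\lambda_j \right|. \]

By the Cauchy-Schwarz inequality and the Hoffman-Wielandt inequality [9, Theorem II.6.11],

\[ \sum_{j=1}^N \left| \lambda_j - \tilde\lambda_j \right| \le \sqrt{ N \sum_{j=1}^N \left| \lambda_j - \tilde\lambda_j \right|^2 } \le \sqrt{N} \left\| \mathcal{K} - \widetilde{\mathcal{K}} \right\|_{HS}, \tag{4} \]

where $\| \cdot \|_{HS}$ denotes the Hilbert-Schmidt norm. The result now follows from the general fact that the
Hilbert-Schmidt norm of an integral operator on $L^2(\mu)$ is given by the $L^2(\mu \otimes \mu)$ norm of its kernel (see
e.g. [20, p. 245]). □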
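The coupling in the proof of Proposition 4 is easy to simulate. The sketch below draws one shared uniform variable per index, thresholds it at each of the two eigenvalues, and estimates the mean of the absolute difference of the two counts, which the proof bounds by $\sum_j |\lambda_j - \tilde\lambda_j|$. The eigenvalue lists, function names, trial count, and seed are all made up for illustration; they do not come from an actual kernel.

```python
import random


def coupled_counts(lams, lams_tilde, rng):
    """One draw of (N, N~) using a shared uniform Y_j for each index j."""
    count = count_tilde = 0
    for lam, lam_t in zip(lams, lams_tilde):
        y = rng.random()
        count += y <= lam          # xi_j = 1{Y_j <= lambda_j}
        count_tilde += y <= lam_t  # xi~_j = 1{Y_j <= lambda~_j}
    return count, count_tilde


def mean_coupling_gap(lams, lams_tilde, trials=20000, seed=0):
    """Monte Carlo estimate of E|N - N~| under the shared-uniform coupling."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a, b = coupled_counts(lams, lams_tilde, rng)
        total += abs(a - b)
    return total / trials


# Hypothetical eigenvalue lists in [0, 1], in nonincreasing order.
lams = [0.9, 0.6, 0.3, 0.1]
lams_tilde = [0.8, 0.65, 0.2, 0.1]
l1_gap = sum(abs(a - b) for a, b in zip(lams, lams_tilde))
estimate = mean_coupling_gap(lams, lams_tilde)
print(estimate, l1_gap)
```

Up to sampling error the estimate should not exceed `l1_gap`; it can be slightly smaller because differences of opposite sign at different indices may cancel.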


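We are now in a position to prove the main theorem.

Proof of Theorem 2. For every $0 \le \varphi \le \frac{\pi}{2}$,

\[ \varphi - \frac{1}{6}\varphi^3 \le \sin\varphi \le m \sin\left(\frac{\varphi}{m}\right) \le \varphi, \]

and so

\[ 0 \le \frac{1}{\sin\varphi} - \frac{1}{m \sin\left(\frac{\varphi}{m}\right)} \le \frac{1}{\varphi - \frac{1}{6}\varphi^3} - \frac{1}{\varphi} = \frac{\varphi}{6 - \varphi^2} \le \frac{\varphi}{3}. \]

Thus by Propositions 3 and 4,

\[
\begin{aligned}
W_1\left(\mathcal{N}_A, \mathcal{N}_A^{(m)}\right)
&\le \sqrt{ \frac{mn}{(2\pi)^2} \int_A \int_A \sin^2\left(\frac{n(x-y)}{2}\right) \left[ \frac{1}{\sin\left(\frac{x-y}{2}\right)} - \frac{1}{m \sin\left(\frac{x-y}{2m}\right)} \right]^2 dx \, dy } \\
&\le \frac{1}{6\pi} \sqrt{ mn \int_A \int_A (x-y)^2 \, dx \, dy } \\
&\le \frac{1}{6\pi} \sqrt{mn} \, |A| \operatorname{diam} A.
\end{aligned}
\tag{5}
\]

□

The refinement (1) of Theorem 2 follows by using a higher-order Taylor expansion in (5).

Both $\mathcal{N}_\theta$ and $\mathcal{N}_\theta^{(m)}$ satisfy central limit theorems in the mesoscopic regime (see Proposition 8 and the
remark which follows). We show in the next section that Theorem 2 does indeed describe a non-trivial
self-similarity phenomenon on a mesoscopic level, which is not the result of both processes having the
same limit.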
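The elementary trigonometric inequalities at the start of the proof of Theorem 2 can be spot-checked on a grid. The following sketch tests the chain $\varphi - \varphi^3/6 \le \sin\varphi \le m\sin(\varphi/m) \le \varphi$ and the resulting bound on $\frac{1}{\sin\varphi} - \frac{1}{m\sin(\varphi/m)}$ for $0 < \varphi \le \pi/2$. The grid, the choice of values of $m$, and the function names are arbitrary choices for illustration; a grid check is of course not a proof.

```python
import math


def kernel_gap(phi, m):
    """1/sin(phi) - 1/(m sin(phi/m)), the bracket appearing in (5)."""
    return 1 / math.sin(phi) - 1 / (m * math.sin(phi / m))


def check_chain(phi, m):
    """True if both inequality chains from the proof hold at (phi, m)."""
    taylor_lower = phi - phi ** 3 / 6
    sines_ok = taylor_lower <= math.sin(phi) <= m * math.sin(phi / m) <= phi
    gap_ok = 0 <= kernel_gap(phi, m) <= phi / (6 - phi ** 2) <= phi / 3
    return sines_ok and gap_ok


# Grid over (0, pi/2] for a few illustrative values of m.
grid = [j * (math.pi / 2) / 200 for j in range(1, 201)]
all_ok = all(check_chain(phi, m) for m in (1, 2, 5, 50) for phi in grid)
print(all_ok)
```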

In the microscopic regime, one can say more. As was first observed in [5], and more clearly spelled
out in [13], the kernel $K_n^{(1)} = K_n$ has the following microscopic scaling limit:

\[ \lim_{n \to \infty} \frac{2\pi}{n} K_n\left( \frac{2\pi x}{n}, \frac{2\pi y}{n} \right) = \frac{\sin(\pi(x-y))}{\pi(x-y)}. \tag{6} \]

The same microscopic scaling limit appears for bulk eigenvalues of certain Hermitian random matrices
as well; see [1, 4, 13]. There is sufficient uniformity in the convergence in (6) to imply that the point
process $\chi_n$, rescaled to lie in $[-n/2, n/2)$, converges as $n \to \infty$ to an unbounded point process on $\mathbb{R}$ which
is determinantal, with the right hand side of (6) as its kernel with respect to Lebesgue measure. This
process is called the sine kernel process; we denote by $\mathcal{S}_A$ the number of points of the sine kernel process
which lie in $A \subseteq \mathbb{R}$. In particular, by e.g. [1, Lemma 4.2.48], $\mathcal{N}_{n, \frac{2\pi}{n}A} \Rightarrow \mathcal{S}_A$. A limited version of Theorem
2 can be deduced from the convergence to the sine kernel process. On the other hand, Theorem 2 actually
improves on the classical microscopic result by estimating a rate of convergence, as follows.


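Corollary 6. Let $A \subseteq \mathbb{R}$, and let $\mathcal{S}_A$ denote the number of points of the sine kernel process which lie in $A$.
Then

\[ d_{TV}\left( \mathcal{N}_{n, \frac{2\pi}{n}A}, \mathcal{S}_A \right) \le W_1\left( \mathcal{N}_{n, \frac{2\pi}{n}A}, \mathcal{S}_A \right) \le \frac{5 \, |A| \operatorname{diam} A}{n^{3/2}} \]

for all sufficiently large $n$.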


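Proof. Let $n$ be large enough that $\frac{2\pi}{n}A \subseteq [-\pi, \pi)$ and $\operatorname{diam}\left(\frac{2\pi}{n}A\right) \le \pi$. Let $k \ge 0$. Recall that by definition of
$\mathcal{N}^{(2)}$,

\[ \mathcal{N}^{(2)}_{2^k n, \frac{2\pi}{2^k n}A} \overset{d}{=} \mathcal{N}_{2^{k+1} n, \frac{2\pi}{2^{k+1} n}A}. \]

It thus follows from Theorem 2 (with $m = 2$ and $2^k n$ in place of $n$) that

\[ W_1\left( \mathcal{N}_{2^k n, \frac{2\pi}{2^k n}A}, \mathcal{N}_{2^{k+1} n, \frac{2\pi}{2^{k+1} n}A} \right) \le \frac{\sqrt{2^{k+1} n}}{6\pi} \left( \frac{2\pi}{2^k n} \right)^2 |A| \operatorname{diam} A = \frac{2\sqrt{2}\pi \, |A| \operatorname{diam} A}{3 \, (2^k n)^{3/2}}. \]

Fixing $M \in \mathbb{N}$ and applying this estimate for each $k \in \{0, \dots, M-1\}$ then gives that

\[ W_1\left( \mathcal{N}_{n, \frac{2\pi}{n}A}, \mathcal{S}_A \right) \le \frac{2\sqrt{2}\pi \, |A| \operatorname{diam} A}{3 n^{3/2}} \sum_{k=0}^{M-1} \frac{1}{2^{3k/2}} + W_1\left( \mathcal{N}_{2^M n, \frac{2\pi}{2^M n}A}, \mathcal{S}_A \right). \tag{7} \]

As was discussed above, it is well known that $\mathcal{N}_{2^M n, \frac{2\pi}{2^M n}A} \Rightarrow \mathcal{S}_A$ as $M \to \infty$. Since all of the $\mathcal{N}_{2^M n, \frac{2\pi}{2^M n}A}$
and $\mathcal{S}_A$ are nonnegative random variables with means equal to $|A|$, weak convergence is equivalent to $W_1$
convergence, and so $W_1\left( \mathcal{N}_{2^M n, \frac{2\pi}{2^M n}A}, \mathcal{S}_A \right) \to 0$ as $M \to \infty$. Thus taking the limit $M \to \infty$ in (7) yields

\[ W_1\left( \mathcal{N}_{n, \frac{2\pi}{n}A}, \mathcal{S}_A \right) \le \frac{2\sqrt{2}\pi \, |A| \operatorname{diam} A}{3 n^{3/2}} \sum_{k=0}^{\infty} \frac{1}{2^{3k/2}} \le \frac{5 \, |A| \operatorname{diam} A}{n^{3/2}}. \]

□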
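The numerical constant 5 in Corollary 6 comes from summing a geometric series. As a quick arithmetic check, the sketch below evaluates $\frac{2\sqrt{2}\pi}{3} \sum_{k \ge 0} 2^{-3k/2}$ in closed form and compares it with a partial sum; the variable names are hypothetical.

```python
import math

# Prefactor from Theorem 2 applied with m = 2 after rescaling by 2*pi/(2^k n).
prefactor = 2 * math.sqrt(2) * math.pi / 3

# Geometric series sum_{k>=0} 2^(-3k/2) = 1 / (1 - 2^(-3/2)).
series = 1 / (1 - 2 ** -1.5)
partial = sum(2 ** (-1.5 * k) for k in range(60))

constant = prefactor * series
print(constant)
```

The resulting constant is a little above 4.5, so rounding up to 5 in the statement of the corollary is safe.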


Remark. The application of the Cauchy-Schwarz inequality in (4) in the proof of
Proposition 4 above is the source of the factor of $\sqrt{n}$ in the statement of Theorem 2, which we conjecture
to be unnecessary. A direct estimate of the quantity

\[ \sum_{j=1}^N \left| \lambda_j - \tilde\lambda_j \right|, \]

which is bounded by the trace class norm of the difference $\mathcal{K} - \widetilde{\mathcal{K}}$, could potentially avoid that dimensional
factor, thereby increasing the size of the mesoscopic regime in which Theorem 2 gives non-trivial
information. Unfortunately, trace class norms are considerably more difficult to compute than
Hilbert-Schmidt norms, and we have not found an estimate which improves on the approach taken above.


3 Some further asymptotics 

The following lemma gives asymptotics for $\operatorname{Var}\mathcal{N}_\theta^{(m)}$ in various regimes. As was mentioned in the
introduction, the paper [14] gives precise asymptotics as $n \to \infty$ for $\operatorname{Var}\mathcal{N}_\theta$ when $\theta$ is fixed, but in the
present context, estimates for when $\theta$ varies with $n$ are needed.

Lemma 7. Let m G N be fixed. Whenever ^ < 6 < f, 


Moreover, 


VarXi'” 


VarAf 


(m 


n,Qn 


VarA^”^ - 

^Ind 
y 371 

)■ 

(rJe^-+2 
m) ^ ) 4 

^ ~ [|log(c^/^n6) 

if^<Q<l 
ifJ<e< f. 

n — — 2 

( 0 . f] }. 



Q —)• oo if and only if 

nQn ->oo 


Proof. Observe that $\mathcal{N}^{(m)}_{n,\theta}$ has the same distribution as $\mathcal{N}_{mn,[-\theta/m,\theta/m]} = \mathcal{N}_{mn,\theta/m}$; since $m$ is
fixed (i.e., independent of $n$), it therefore suffices to prove the estimates for $m = 1$. Using general formulae
for determinantal point processes, it is shown in the proof of [12, Proposition 8] that

\[ \operatorname{Var}\mathcal{N}_\theta = \frac{1}{2\pi^2} \left[ \int_0^{2\theta} \frac{z \sin^2\left(\frac{nz}{2}\right)}{\sin^2\left(\frac{z}{2}\right)} \, dz + 2\theta \int_{2\theta}^{\pi} \frac{\sin^2\left(\frac{nz}{2}\right)}{\sin^2\left(\frac{z}{2}\right)} \, dz \right]. \tag{8} \]

Now suppose that < 0 < |
then z — I > |, and so

dz 


Ifz>£> f, 


^20 sin^ 1 

(nz^ 
K 2 ) 


^20 1 — COS^ 1 

:f) 

Je Z 


" J 

e Z 


£ J 


M3) 


y26 sin' 

r2e+f 

/e+f 


^(¥ + f) 


dz 


> 

> 


e y 


Thus 


Now setting e 


log(l)- 
= log(^ 

sin^ I 


Jie 
r^o+-n 1 
Jie z — 


sin^ ( 

:f) 

- dz 


z- 

71 

n 


sin^ ( 

:¥) 

- dz — 

/'20 sin2 (f) 

z- 

71 

n 

/ 7 ^ 

Je Z-- 


sin^ I 


- 


^20 sin^ (f) 


liz. 


lyi 

n 


sin 2 f) 1 fie- 

, -^*>Alog - 

Je z 3 \ e 

y2e-f)>f, 


K ' 
n 


(^)=[s - 3 ] - (■ 


and so 


by (8), 

1 z^i 
VarNe > ' 


. {2ne\ 


1 /■2^zsin^(f) 2 /■2^sin^(f) 1 f 2nd'^ 

-IMW 

For the upper bound, observe that $\sin\left(\frac{z}{2}\right) \ge \frac{z}{\pi}$ for $0 \le z \le \pi$, thus

/■^sin^(^) , 

20 / —oW dz<2e -^dz< 7^^ 

220 sin Jie z 


pie 


zsin^ (^) 


/■2® 

dz < 


Tl^n^d^ 


I yt /t <. jL 'yi 0 

/ —^^ 

/o 4 

<e< 

n — — 


sin2(|) 

ny 0, while if ^ < 0 < Tt, then 

/•2e2sin^(f) f^/" K'^n^z k'^ 

/ - 9 dz< — 2 — dz+ — 

Jo sin(|] Jo 4 J 2 /n z 


□

One of the consequences of the lemma is that it allows us to identify the regime in which the (centered,
normalized) counting function has a Gaussian limit, and to provide the estimates of the rate of convergence
to Gaussian in that regime given in Proposition 8 below. The real point of the proposition is that the
convergence of the centered, normalized counting functions of either point process to a Gaussian limit
is much slower than the merging of distributions given in Theorem 2, meaning that the resemblance
between $\mathcal{N}_A$ and $\mathcal{N}_A^{(m)}$ is emphatically not a consequence of the central limit theorem.
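Formula (8) for the variance of $\mathcal{N}_\theta$, used in the proof of Lemma 7, can be compared numerically with an independent expression. The sketch below assumes the standard identity $\operatorname{Var}\mathcal{N}_A = \int_A K(x,x)\,dx - \int_A\int_A |K(x,y)|^2\,dx\,dy$ for determinantal processes and expands $|K_n|^2$ in Fourier modes, which gives a finite sum for $A = [-\theta, \theta]$; this expansion is my own derivation and not taken from the text. The midpoint quadrature rule, step count, function names, and test values are illustrative choices.

```python
import math


def var_exact(n, theta):
    """Var of the number of eigenangles in [-theta, theta] via Fourier modes.

    Assumes Var = n*theta/pi - (1/pi^2) * (n*theta^2
                  + 2 * sum_{l=1}^{n-1} (n - l) * sin(l*theta)^2 / l^2).
    """
    s = sum((n - l) * math.sin(l * theta) ** 2 / l ** 2 for l in range(1, n))
    return n * theta / math.pi - (n * theta ** 2 + 2 * s) / math.pi ** 2


def fejer_ratio(z, n):
    """sin^2(n z / 2) / sin^2(z / 2), the integrand factor in (8)."""
    return (math.sin(n * z / 2) / math.sin(z / 2)) ** 2


def midpoint(f, a, b, steps=20000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def var_formula8(n, theta):
    """Right-hand side of (8), valid for 0 < theta <= pi/2."""
    first = midpoint(lambda z: z * fejer_ratio(z, n), 0.0, 2 * theta)
    second = midpoint(lambda z: fejer_ratio(z, n), 2 * theta, math.pi)
    return (first + 2 * theta * second) / (2 * math.pi ** 2)


print(var_exact(10, 0.5), var_formula8(10, 0.5))
```

The two values should agree up to quadrature error, which gives some reassurance that (8) has been transcribed correctly.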

Proposition 8. For 0 < 0 < | and n>l, define 


X, 


n.e 


'^n,e — 


K 




For each n, let 0„ G [O, |]. The sequence {X„ q^} converges weakly to the standard Gaussian distribution 
as n^oo if and only ifndn — )• Moreover, whenever ^ < 6 < n, 


3V2 


32J\og{e^l'^nd) 


< sup 

reK 


^[Xn,6<t]- 




dx 


< 




Remark. For $m \in \mathbb{N}$ fixed, a central limit theorem for $\mathcal{N}^{(m)}_{n,\theta}$, which has the same distribution as
$\mathcal{N}_{mn,\theta/m}$, follows immediately from Proposition 8.


Proof. First observe that for any integer-valued random variable X with finite second moment, it follows 
from Chebychev’s inequality that 


4 “ ■ 


|X-EX| < 2\/VarX 


^ F[X = k]<m£LxF[X = k](4Vyi 

kez, ^ ^ 

|t:-EX|<2VVarY 


arX 


The cumulative distribution function of X thus has a jump of at least 


16\/VarX 


at some integer, and so 


sup|P[X<t]-P[T<t]| > 

teR 


3 

32VVarX 


for any continuous random variable Y. Now, 


sup 

reR 


^[Xn,e<t] 



sup 

teR 


P [Ne < t] - P [T < t] 


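where $Y$ is a Gaussian random variable with the same mean and variance as $\mathcal{N}_\theta$. Since $\mathcal{N}_\theta$ is
integer-valued, together with Lemma 7 this proves both the lower bound in the proposition and the fact that
$X_{n,\theta_n}$ can only have a Gaussian limit if $n\theta_n \to \infty$.

For the other estimate, the Berry-Esseen theorem (see, e.g., [6, Theorem XVI.5.1]) implies that if
$Y_1, \dots, Y_n$ are independent random variables in $[0, 1]$ and $X = \sum_{i=1}^n Y_i$, then

\[
\begin{aligned}
\sup_{t \in \mathbb{R}} \left| \mathbb{P}\left[ \frac{X - \mathbb{E}X}{\sqrt{\operatorname{Var} X}} \le t \right] - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^t e^{-x^2/2} \, dx \right|
&\le \frac{3}{(\operatorname{Var} X)^{3/2}} \sum_{i=1}^n \mathbb{E}\left| Y_i - \mathbb{E}Y_i \right|^3 \\
&\le \frac{3}{(\operatorname{Var} X)^{3/2}} \sum_{i=1}^n \mathbb{E}\left| Y_i - \mathbb{E}Y_i \right|^2 \\
&= \frac{3}{\sqrt{\operatorname{Var} X}}.
\end{aligned}
\]

By Lemma 5, this may be applied to $X_{n,\theta}$, and so Lemma 7 implies the upper bound in the proposition. □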


As discussed in the introduction, we conjecture that Theorem 2 holds for any shrinking sequence
of sets $A_n \subseteq [-\pi, \pi)$. We are not able to prove full distributional comparisons for the entire regime;
however the following result shows that equality of means and asymptotic equality of variances does hold
throughout the entire mesoscopic regime. That is, if $\{A_n\}$ is any sequence of subsets of $[-\pi, \pi)$ such that
$\operatorname{diam} A_n \le \pi$ eventually and $|A_n| \to 0$ (in particular, if $\operatorname{diam} A_n \to 0$), then

\[ \operatorname{Var}\mathcal{N}_{A_n} - \operatorname{Var}\mathcal{N}_{A_n}^{(m)} \to 0 \]

as $n \to \infty$. For context, recall that it has already been shown that $\operatorname{Var}\mathcal{N}_\theta$ itself, and thus $\operatorname{Var}\mathcal{N}_\theta^{(m)}$ as well,
is of order $\log(n\theta)$ when $\theta \ge \frac{1}{n}$.

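Theorem 9. For each $m, n \ge 1$ and $A \subseteq [-\pi, \pi)$,

\[ \mathbb{E}\mathcal{N}_A = \mathbb{E}\mathcal{N}_A^{(m)} = \frac{n|A|}{2\pi}. \]

If in addition $\operatorname{diam} A \le \pi$, then

\[ 0 \le \operatorname{Var}\mathcal{N}_A - \operatorname{Var}\mathcal{N}_A^{(m)} \le \frac{|A|^2}{4\pi^2}. \]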

Proof. By Proposition 3 and a general formula for the variance of the counting function of a determinantal
point process (see [7, Appendix B]),

\[ \operatorname{Var}\mathcal{N}_A - \operatorname{Var}\mathcal{N}_A^{(m)} = \frac{1}{4\pi^2} \int_A \int_A \sin^2\left(\frac{n(x-y)}{2}\right) \left[ \frac{1}{\sin^2\left(\frac{x-y}{2}\right)} - \frac{1}{m^2 \sin^2\left(\frac{x-y}{2m}\right)} \right] dx \, dy. \]

As in the proof of Theorem 2, for $0 \le \varphi \le \frac{\pi}{2}$,

\[ 0 \le \frac{1}{\sin^2\varphi} - \frac{1}{m^2 \sin^2\left(\frac{\varphi}{m}\right)} \le \frac{1}{\sin^2\varphi} - \frac{1}{\varphi^2} \le 1, \]

from which the result follows.

□

4 Comparison of joint intensities

We conclude with the surprising fact that the joint intensities of the process $\chi_n^{(m)}$ are always larger than
those of the eigenvalue process $\chi_n$; the implications of this observation remain mysterious (at least to us).

Proposition 10. For each $m$, $n$, and $k$, let $\rho_k : [0, 2\pi)^k \to \mathbb{R}$ denote the $k$-th joint intensity of the
determinantal point process $\chi_n$, and let $\rho_k^{(m)}$ denote the $k$-th joint intensity of the determinantal point process $\chi_n^{(m)}$.
Then for each $x_1, \dots, x_k \in [0, 2\pi)$,

\[ \rho_k^{(m)}(x_1, \dots, x_k) \ge \rho_k(x_1, \dots, x_k). \]
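The case $k = 2$ of Proposition 10 is simple enough to check by direct computation, since the second joint intensity of a determinantal process with Hermitian kernel $K$ is $K(x,x)K(y,y) - |K(x,y)|^2$. The sketch below assumes the exponential-sum form of the kernel, $\frac{1}{2\pi m}\sum_{j=0}^{mn-1} e^{ij(x-y)/m}$, for the rescaled process; the function names and sample points are hypothetical.

```python
import cmath
import math


def t_kernel(x, y, n, m=1):
    """Assumed kernel (1/(2*pi*m)) * sum_{j=0}^{mn-1} exp(i*j*(x - y)/m)."""
    total = sum(cmath.exp(1j * j * (x - y) / m) for j in range(m * n))
    return total / (2 * math.pi * m)


def rho2(x, y, n, m=1):
    """Second joint intensity: K(x,x) K(y,y) - |K(x,y)|^2."""
    kxx = t_kernel(x, x, n, m).real
    kyy = t_kernel(y, y, n, m).real
    return kxx * kyy - abs(t_kernel(x, y, n, m)) ** 2


# Compare the rescaled process (m = 3) with the n x n process (m = 1).
n = 5
points = [(0.1, 0.4), (1.0, 2.5), (0.3, 6.0), (2.0, 2.1)]
for x, y in points:
    print(x, y, rho2(x, y, n, 3), rho2(x, y, n, 1))
```

For $k = 2$ the inequality reduces to $|K_n^{(m)}(x,y)| \le |K_n(x,y)|$, because both kernels take the value $n/(2\pi)$ on the diagonal.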

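Proof. For this proof we use a different kernel which also generates the point process $\chi_n$ (see [5] or [10,
Section 5.2]):

\[ T_n(x, y) := \frac{1}{2\pi} \sum_{j=0}^{n-1} e^{ij(x-y)}, \]

which, by the same change of variables used in the proof of Proposition 3, implies that $\chi_n^{(m)}$ is generated
by the kernel

\[
\begin{aligned}
T_n^{(m)}(x, y) &:= \frac{1}{2\pi m} \sum_{j=0}^{mn-1} e^{ij(x-y)/m} \\
&= \frac{1}{2\pi m} \sum_{p=0}^{m-1} \sum_{q=0}^{n-1} e^{i(mq+p)(x-y)/m} \\
&= \frac{1}{m} \sum_{p=0}^{m-1} e^{ip(x-y)/m} \, T_n(x, y).
\end{aligned}
\]

It follows that

\[ \left[ T_n^{(m)}(x_j, x_\ell) \right]_{j,\ell=1}^k = \frac{1}{m} \sum_{p=0}^{m-1} D^p \left[ T_n(x_j, x_\ell) \right]_{j,\ell=1}^k (D^p)^*, \]

where $D = \operatorname{diag}(e^{ix_1/m}, \dots, e^{ix_k/m})$ is a diagonal unitary matrix, and so by Minkowski's determinant
inequality [2, Corollary II.3.21],

\[
\begin{aligned}
\left( \rho_k^{(m)}(x_1, \dots, x_k) \right)^{1/k}
&= \left( \det\left[ \frac{1}{m} \sum_{p=0}^{m-1} D^p \left[ T_n(x_j, x_\ell) \right] (D^p)^* \right] \right)^{1/k} \\
&\ge \frac{1}{m} \sum_{p=0}^{m-1} \left( \det\left( D^p \left[ T_n(x_j, x_\ell) \right] (D^p)^* \right) \right)^{1/k} \\
&= \frac{1}{m} \sum_{p=0}^{m-1} \left( \det\left[ T_n(x_j, x_\ell) \right] \right)^{1/k} \\
&= \rho_k(x_1, \dots, x_k)^{1/k}.
\end{aligned}
\]

□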